This report provides an evaluation of the accuracy and precision of probabilistic forecasts submitted to the COVID-19 Forecast Hub over the last 5 weeks. The forecasts evaluated were submitted between November 17, 2020 and December 21, 2020. The data used for scoring reflect revisions as of 2020-12-29.
In this weekly report we evaluate forecasts made for 57 different locations (the US at the national level, 50 states, and 6 territories), at 4 horizons over 5 submission weeks. We evaluate 3 targets: incident cases, incident deaths, and cumulative deaths.
In collaboration with the US CDC, our team collects COVID-19 forecasts from dozens of teams around the globe. Each Monday evening or Tuesday morning, we combine the most recent forecasts from each team into a single "ensemble" forecast for each of the target submissions.
Typically on Wednesday or Thursday of each week, a summary of the week's forecasts from the COVID-19 Forecast Hub, including the ensemble forecast, appears on the official CDC COVID-19 forecasting page.
This figure shows the number of incident cases reported each week. The period between the vertical lines indicates the weeks for which models were evaluated.
This figure shows the number of locations for which each model submitted incident case forecasts. This report includes a maximum of 57 locations: all 50 states, a national level forecast, and 6 US territories. The number of models that submitted forecasts for incident cases is 37.
This table shows the performance of each model based on its interval coverage, relative WIS, and relative MAE. The data in this table are aggregated across all submission weeks, locations, and horizons. Well calibrated models should have a 50% coverage level of 0.5 and a 95% coverage level of 0.95. The relative WIS and relative MAE scores are calculated using a pairwise comparison developed by Johannes Bracher. The code for this comparison can be found here.
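As an illustrative sketch (not the Hub's actual scoring code), empirical interval coverage can be computed by counting how often the observed value falls inside a model's central prediction interval; the function and data below are hypothetical.

```python
def interval_coverage(lowers, uppers, observed):
    """Fraction of observations that fall inside their [lower, upper] interval."""
    hits = sum(1 for lo, hi, y in zip(lowers, uppers, observed) if lo <= y <= hi)
    return hits / len(observed)

# Toy example: two of three observations fall inside their intervals.
cov = interval_coverage([8, 90, 15], [12, 110, 25], [10, 120, 20])
```

A well calibrated model's 50% intervals should yield a coverage value near 0.5, and its 95% intervals a value near 0.95, when aggregated over many forecasts.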
Next, we have compared scores by submission week. These values are aggregated across all 50 states and the national level forecast.
In this figure, the dotted black line represents the average 1 week ahead error. There is greater variation in error at the 4 week horizon than at the 1 week horizon.
The following figure shows the scores of models aggregated by horizon and submission week. In this figure, we have only included models that have submitted forecasts for all 4 horizons and all submission weeks evaluated.
In the 5 week evaluation period, the evaluated Saturdays are 2020-11-28 through 2020-12-26. The number of models that submitted forecasts for incident deaths is 49. The number of models that submitted forecasts for all 5 weeks was 45. The number of teams that submitted forecasts for all locations was 10.
The figure below shows the number of locations that each model submitted forecasts for during this evaluation period. Models that are eligible for evaluation, based on the number of weeks submitted and the number of targets for each week, are bolded. The dates listed on the x axis are the Saturdays before the first horizon; each is the Saturday associated with the target submission week. If a model is submitted on a Tuesday through Friday, the listed Saturday occurs after the submission; if the model is submitted on a Sunday or Monday, the Saturday occurs before the submission date.
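The submission-date-to-Saturday rule above can be sketched as a small helper (a hypothetical function, assuming Python's `datetime` weekday convention of Monday = 0 through Sunday = 6):

```python
from datetime import date, timedelta

def reference_saturday(submission: date) -> date:
    """Saturday associated with a submission week: the previous Saturday
    for Sunday/Monday submissions, the upcoming Saturday otherwise."""
    wd = submission.weekday()  # Monday = 0, ..., Saturday = 5, Sunday = 6
    if wd in (0, 6):  # Monday or Sunday: Saturday before the submission
        return submission - timedelta(days=(wd - 5) % 7)
    return submission + timedelta(days=(5 - wd) % 7)  # Saturday on/after

# 2020-12-21 was a Monday, so its associated Saturday is 2020-12-19,
# while a Tuesday 2020-12-22 submission maps forward to 2020-12-26.
```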
The figure below shows the number of locations and weeks that each team has submitted forecasts for.
Each week, we will generate a leaderboard table to assess the interval coverage and relative weighted interval scores (WIS) of each model.
The relative WIS is calculated to account for variation in the difficulty of forecasting different weeks and locations. It uses a pairwise approach to assess how accurate each model is compared to the baseline: models with a relative WIS lower than 1 are more accurate than the baseline at predicting the number of incident deaths, and models with a relative WIS greater than 1 are less accurate than the baseline.
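For reference, the WIS itself can be sketched from the standard interval-score formulation (this is an illustrative implementation under that assumption, not the Hub's scoring code):

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval:
    the width plus a penalty of (2/alpha) per unit the truth falls outside."""
    score = upper - lower
    if y < lower:
        score += (2 / alpha) * (lower - y)
    elif y > upper:
        score += (2 / alpha) * (y - upper)
    return score

def weighted_interval_score(median, intervals, y):
    """WIS from a predictive median and a list of (alpha, lower, upper)
    intervals, using the standard weights w0 = 1/2 and w_k = alpha_k / 2."""
    total = 0.5 * abs(y - median)
    for alpha, lo, hi in intervals:
        total += (alpha / 2) * interval_score(lo, hi, y, alpha)
    return total / (len(intervals) + 0.5)
```

With a single 50% interval [8, 12], a median of 10, and a truth of 10, the score reflects only the interval width; moving the truth outside the interval adds the miss penalties, so wider or worse-centered forecasts score higher (worse).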
For inclusion in this table, a team must have submitted a model every week within the last 5 weeks and have submitted a forecast for every horizon (1 - 4 weeks ahead) for each week.
In the following figures, we have evaluated the average WIS for models across multiple forecasting weeks. The models included in this comparison must have submitted forecasts for all locations. The first figure shows the mean WIS across all locations for each submission week at a 1 week horizon; the second shows the same aggregation at a 4 week horizon.
To view a specific team, double click on the team name in the legend. To view a value on the plot, click on the point in the forecast of interest.
Finally, we have evaluated the locations for which teams had the lowest WIS scores. In this figure, models are included if they submitted forecasts for all submission weeks and all horizons. The WIS scores, stratified by location, are shown in each box. The color scheme shows the WIS score relative to the baseline.
The number of models that submitted forecasts for cumulative deaths is 49. The number of models that submitted forecasts for all 5 weeks was 46. The number of teams that submitted forecasts for all locations was 10.
The figure below shows the number of locations and weeks that each team has submitted forecasts for.